Learning Bayesian Networks Using Feature Selection
Authors
Abstract
This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features that maximizes predictive accuracy prior to the network learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks that are computationally simpler to evaluate and display predictive accuracy comparable to that of Bayesian networks which model all attributes.

28.1 Introduction

Bayesian networks are being increasingly recognized as an important representation for probabilistic reasoning. For many domains, the need to specify the probability distributions for a Bayesian network is considerable, and learning these probabilities from data using an algorithm like K2 [Cooper92] could alleviate such specification difficulties. We describe an extension to the Bayesian network learning approaches introduced in K2. Our goal is to construct networks that are simpler to evaluate but still have high predictive accuracy relative to networks that model all features. Rather than use all database features (or attributes) for constructing the network, we select a subset of features that maximizes the predictive accuracy of the network. The learning process then uses only the selected features as nodes in learning the Bayesian network. We examine explicitly the effects of two aspects of the algorithm: (a) feature selection, and (b) node ordering. Our experimental results verify that this approach generates networks that are computationally simpler to evaluate and display predictive accuracy comparable to the predictive accuracy of Bayesian networks that model all features. Our results, similar to those observed by other studies of feature selection in learning [Caruana94, John94, Langley94a, Langley94b], demonstrate that feature selection provides comparable predictive accuracy using smaller networks.
For example, by selecting as few as 18% of the features for the ...
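The two-phase approach described above can be sketched in code. The K2 search greedily adds parents for each node, drawn only from nodes earlier in a fixed ordering, as long as the Cooper-Herskovits score improves; the feature-selection phase, performed first, simply restricts which variables appear in that ordering. The function names, the `max_parents` cap, and the toy data are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of K2-style structure learning over a pre-selected feature set.
# Data rows are tuples of discrete values, indexed by variable number.
from itertools import product
from math import lgamma

def k2_score(data, child, parents, arity):
    """Cooper-Herskovits log score of `child` given a candidate parent set."""
    score = 0.0
    r = arity[child]
    # Enumerate joint parent states; for no parents this is one empty state.
    for ps in product(*[range(arity[p]) for p in parents]):
        counts = [0] * r
        for row in data:
            if all(row[p] == v for p, v in zip(parents, ps)):
                counts[row[child]] += 1
        n = sum(counts)
        score += lgamma(r) - lgamma(n + r)
        score += sum(lgamma(c + 1) for c in counts)
    return score

def k2(data, order, arity, max_parents=2):
    """Greedy K2: for each node, repeatedly add the score-improving parent
    from among nodes earlier in `order`, until no candidate improves."""
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        best = k2_score(data, v, parents[v], arity)
        improved = True
        while improved and len(parents[v]) < max_parents:
            improved = False
            for cand in order[:i]:
                if cand in parents[v]:
                    continue
                s = k2_score(data, v, parents[v] + [cand], arity)
                if s > best:
                    best, best_cand, improved = s, cand, True
            if improved:
                parents[v].append(best_cand)
    return parents
```

On a toy dataset where variable 1 mostly copies variable 0, the search recovers the edge 0 → 1, because conditioning on variable 0 concentrates the counts and raises the score.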
Similar References
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...
Bayesian Network Learning with Discrete Case-Control Data
We address the problem of learning Bayesian networks from discrete, unmatched case-control data using specialized conditional independence tests. Those tests can also be used for learning other types of graphical models or for feature selection. We also propose a post-processing method that can be applied in conjunction with any Bayesian network learning algorithm. In simulations we show that o...
Building Interpretable Models: From Bayesian Networks to Neural Networks
This dissertation explores the design of interpretable models based on Bayesian networks, sum-product networks and neural networks. As briefly discussed in Chapter 1, it is becoming increasingly important for machine learning methods to make predictions that are interpretable as well as accurate. In many practical applications, it is of interest which features and feature interactions are relev...
Selecting Features by Learning Markov Blankets
In this paper I propose a novel feature selection technique based on Bayesian networks. The main idea is to exploit the conditional independencies entailed by Bayesian networks in order to discard features that are not directly relevant for classification tasks. An algorithm for learning Bayesian networks and its use in feature selection are illustrated. The advantages of this algorithm with re...
Spam Detection in Social Networks Using Correlation Based Feature Subset Selection
Bayesian classifiers work efficiently on some fields, and badly on others. The performance of a Bayesian classifier suffers in fields that involve correlated features. Feature selection is beneficial in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. But the recent increase in the dimensionality of data places a hard challenge t...